Abstract. In this paper we consider two points of view on the problem
of coherent integration of distributed data. First we give a pure
model-theoretic analysis of the possible ways to 'repair' a database. We do so
by characterizing the possibilities to 'recover' consistent data from an
inconsistent database in terms of those models of the database that exhibit
as little inconsistent information as reasonably possible. Then we
introduce an abductive application to restore the consistency of a given
database. This application is based on an abductive solver (the A-system)
that implements an SLDNFA-resolution procedure, and computes a list
of data-facts that should be inserted into the database or retracted from it
in order to keep the database consistent. The two approaches to coherent
data integration are related by soundness and completeness results.



1 Introduction 

Integration of data coming from different databases is a very common, never- 
theless nontrivial, task. There are a number of different phases involved in this 
process, the most important of which are the following: 

1. Resolving the different ontologies and/or database schemas, setting a single
unified schema, and translating the integrity constraints^1 of each database
to the new ontology.

2. Resolving contradictions among the integrity constraints of different local 
databases. 

3. Integrating distributed databases w.r.t. the unified set of integrity constraints, 
computed in the previous phase. 



* Originally published in Proc. PCL 2002, a FLoC workshop; eds. Hendrik Decker, Dina
Goldin, Jørgen Villadsen, Toshiharu Waragai (http://floc02.diku.dk/PCL/).
1 I.e., the rules that represent intensional truths of a database domain.

Each one of the phases mentioned above has its own difficulties and challenges.
For instance, we are not aware of any work that gives a complete and
robust solution to the problem of the first phase. Most of the formalisms for 
database integration implicitly assume that all the databases to be integrated 
have the same ontology, so the first phase is not needed. 

The reason for separating the remaining two phases is that integrity
constraints represent truths that should be valid in all situations, while a database
instance represents an extensional truth, i.e., an actual situation. Consequently,
the policy for resolving contradictions among integrity constraints is often
different from the one that is applied to database facts, and the former should be
applied first.

Despite their different nature, both these phases are based on
formalisms that maintain contradictions and allow one to draw plausible conclusions
from inconsistent situations. Roughly, there are two approaches to handling this
problem:

— Paraconsistent formalisms, in which the amalgamated data may remain in- 
consistent, but the set of conclusions implied by it is not explosive, i.e.: not 
every fact follows from an inconsistent database. Paraconsistent procedures 
for integrating data (e.g., |]l^ , ^l| ) are often based on a paraconsistent reason- 
ing process, such as LFI |L3f , annotated logics J3(jfl0]], or other non-classical 
proof systems jspT^ . 

— Coherent (consistency-based) methods, in which the amalgamated data is
revised in order to restore consistency (see, e.g., j6|,p| JIl] , |25| , ^]| ) . In many 
cases the underlying formalisms of these approaches are closely related to
the theory of belief revision |p],p3|. In the context of database systems the 
idea is to construct consistent databases that are "as close as possible" to 
the original database. These "repaired" instances of the spoiled database 
correspond to plausible and compact ways of restoring consistency. 

In this paper we follow the latter approach, and consider two points of view
for the last phase of the process, namely: coherent methods of integrating dis- 
tributed databases (with the same ontology) w.r.t. a consistent set of integrity 
constraints. The main difficulty in this process stems from the fact that even 
when each local database is consistent, the collective information of all the dis- 
tributed databases may not be consistent anymore. In particular, facts that are 
specified in a particular database may violate some integrity constraints defined 
elsewhere, and so it might contradict some elements in the unified set of integrity 
constraints. Our goal is therefore to find ways to properly "repair" a combined 
database, and restore its consistency. 

One way of viewing this problem is by a model-theoretic analysis that char- 
acterizes database repairs in terms of a certain set of models of the inconsistent 
database (those that, intuitively, minimize the amount of inconsistent information).
The other approach is based on abductive reasoning. For this we use an
abductive solver (the A-system, ^7j) that implements SLDNFA-resolution [16], [17]
for computing a list of data-facts that should be inserted into the database or
retracted from it in order to keep the data consistent. A corresponding application
was introduced and described in greater detail in [7]. Here we review it in order
to keep this paper self-contained and to put our results in the right context.
We then show that the abductive process of coherent integration of databases is
sound and complete w.r.t. the semantics that is induced by the model-theoretic
analysis.^2

2 Coherent integration of databases 

In this paper we assume that we have a first-order language $L$, based on a fixed
database schema $S$, and a fixed domain $D$. Every element of $D$ has a unique
name. A database instance $\mathcal{D}$ consists of atoms in the language $L$ that are
instances of the schema $S$. As such, every instance $\mathcal{D}$ has a finite active domain,
which is a subset of $D$. A database is a pair $(\mathcal{D}, \mathcal{IC})$, where $\mathcal{D}$ is a database
instance, and $\mathcal{IC}$, the set of integrity constraints, is a finite set of formulae in $L$
(assumed to be satisfied by $\mathcal{D}$).

Given a database $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$, we apply to it the closed world assumption,
so only the facts that are explicitly mentioned in $\mathcal{D}$ are considered true. The
underlying semantics corresponds, therefore, to minimal Herbrand interpretations.

Definition 1. The minimal Herbrand model $\mathcal{H}^{\mathcal{D}}$ of a database instance $\mathcal{D}$ is
the model of $\mathcal{D}$ that assigns true to all the ground instances of atomic formulae
in $\mathcal{D}$, and false to all the other atoms.

Definition 2. A formula $\psi$ follows from a database instance $\mathcal{D}$ (notation:
$\mathcal{D} \models \psi$) if the minimal Herbrand model of $\mathcal{D}$ is also a model of $\psi$.

Definition 3. A database $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ is consistent if $\mathcal{IC}$ is a classically
consistent set, and each formula of it follows from $\mathcal{D}$ (notation: $\mathcal{D} \models \mathcal{IC}$).
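
Definitions 1-3 can be illustrated by a small Python sketch. It is only an illustration under stated assumptions: ground atoms are encoded as (relation, constant) pairs, each integrity constraint is given as a Boolean function on the set of true atoms, and the names `follows`, `is_consistent` and `p_implies_q` are hypothetical and do not come from the paper. Classical consistency of the constraint set itself is assumed rather than checked.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Atom = Tuple[str, str]                          # (relation name, constant)
Constraint = Callable[[FrozenSet[Atom]], bool]  # evaluated on the set of true atoms


def follows(instance: Iterable[Atom], constraint: Constraint) -> bool:
    """Definition 2 under the closed world assumption: the atoms of the
    instance are true and every other atom is false."""
    return constraint(frozenset(instance))


def is_consistent(instance: Iterable[Atom], constraints: Iterable[Constraint]) -> bool:
    """Definition 3, assuming the constraint set itself is classically consistent."""
    facts = frozenset(instance)
    return all(follows(facts, c) for c in constraints)


def p_implies_q(true_atoms: FrozenSet[Atom]) -> bool:
    # Hypothetical constraint: forall X (p(X) -> q(X)).
    return all(("q", c) in true_atoms for (rel, c) in true_atoms if rel == "p")


consistent_db = {("p", "a"), ("q", "a")}
inconsistent_db = consistent_db | {("p", "b")}
print(is_consistent(consistent_db, [p_implies_q]))    # True
print(is_consistent(inconsistent_db, [p_implies_q]))  # False
```

Representing the minimal Herbrand model by the set of its true atoms is what makes the closed world reading direct: membership in the set is truth, absence is falsity.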

Our goal is to integrate $n$ consistent databases, $\mathcal{DB}_i = (\mathcal{D}_i, \mathcal{IC}_i)$, $i = 1, \ldots, n$, in
such a way that the combined data will contain everything that can be deduced
from one source of information, without violating any integrity constraint of
another source. The idea is to consider the union of the distributed data, and
then to restore its consistency. A key notion in this respect is the following:

Definition 4. A repair of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ is a pair (Insert, Retract) such that
(1) $Insert \cap \mathcal{D} = \emptyset$, (2) $Retract \subseteq \mathcal{D}$,^3 and
(3) $(\mathcal{D} \cup Insert \setminus Retract, \mathcal{IC})$ is a consistent database.

Intuitively, Insert is a set of elements that should be inserted into $\mathcal{D}$ and
Retract is a set of elements that should be removed from $\mathcal{D}$ in order to obtain a
consistent database.

2 Due to a lack of space some proofs are reduced or omitted altogether. Full proofs
will appear in an extended version of this paper.

3 Note that by conditions (1) and (2) it follows that $Insert \cap Retract = \emptyset$.

Definition 5. A repaired database of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ is a consistent database
$(\mathcal{D} \cup Insert \setminus Retract, \mathcal{IC})$, where (Insert, Retract) is a repair of $\mathcal{DB}$.

As there may be many ways to repair an inconsistent database,^4 it is often
convenient to define preferences among the possible repairs, and consider only
the most preferred ones. Below are two common preference criteria.

Definition 6. Let (Insert, Retract) and (Insert', Retract') be two repairs.

— set inclusion preference criterion: $(Insert', Retract') \leq_i (Insert, Retract)$ if
$Insert \subseteq Insert'$ and $Retract \subseteq Retract'$.

— cardinality preference criterion: $(Insert', Retract') \leq_c (Insert, Retract)$ if
$|Insert| + |Retract| \leq |Insert'| + |Retract'|$.

In what follows we assume that $\leq$ is a fixed pre-order that represents some
preference criterion on the set of repairs.

Definition 7. A $\leq$-preferred repair of $\mathcal{DB}$ is a repair (Insert, Retract) of $\mathcal{DB}$,
s.t. for every repair (Insert', Retract') of $\mathcal{DB}$, if $(Insert, Retract) \leq (Insert', Retract')$
then $(Insert', Retract') \leq (Insert, Retract)$. The set of all the $\leq$-preferred repairs of
$\mathcal{DB}$ is denoted by $!(\mathcal{DB}, \leq)$.

Definition 8. A $\leq$-repaired database of $\mathcal{DB}$ is a repaired database of $\mathcal{DB}$,
constructed from a $\leq$-preferred repair of $\mathcal{DB}$. The set of all the $\leq$-repaired databases
of $\mathcal{DB}$ is denoted by
$\mathcal{R}(\mathcal{DB}, \leq) = \{ (\mathcal{D} \cup Insert \setminus Retract, \mathcal{IC}) \mid (Insert, Retract) \in\ !(\mathcal{DB}, \leq) \}$.

Note that if $\mathcal{DB}$ is consistent, and the preference criterion is a partial order
that is monotonic in the total size of the repairs' components (as in Def. 6), then
$\mathcal{R}(\mathcal{DB}, \leq) = \{\mathcal{DB}\}$, so there is nothing to repair, as expected.

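It is usual to refer to the $\leq$-preferred databases of $\mathcal{DB}$ as the consistent
databases that are 'as close as possible' to $\mathcal{DB}$ itself (see, e.g., J|,[l4]j3lJ] ). Indeed,
denote $Th(\mathcal{D}) = \{P(t) \mid \mathcal{D} \models P(t)\}$, where $P$ is a relation name and $t$ is a ground
tuple, and let $dist(\mathcal{D}_1, \mathcal{D}_2)$ be the following set:

$dist(\mathcal{D}_1, \mathcal{D}_2) = (Th(\mathcal{D}_1) \setminus Th(\mathcal{D}_2)) \cup (Th(\mathcal{D}_2) \setminus Th(\mathcal{D}_1))$

It is easy to see that $\mathcal{DB}' = (\mathcal{D}', \mathcal{IC})$ is a $\leq_i$-repaired database of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$,
if the set $dist(\mathcal{D}', \mathcal{D})$ is minimal (w.r.t. set inclusion) among all the sets of the
form $dist(\mathcal{D}'', \mathcal{D})$, where $\mathcal{D}'' \models \mathcal{IC}$. Similarly, if $\#(S)$ denotes the number of
elements in $S$, then $\mathcal{DB}' = (\mathcal{D}', \mathcal{IC})$ is a $\leq_c$-repaired database of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$,
if $\#(dist(\mathcal{D}', \mathcal{D}))$ is minimal in $\{\#(dist(\mathcal{D}'', \mathcal{D})) \mid \mathcal{D}'' \models \mathcal{IC}\}$.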

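Definition 9. For $\mathcal{DB}_i = (\mathcal{D}_i, \mathcal{IC}_i)$, $i = 1, \ldots, n$, let $\mathcal{UDB} = (\mathcal{D}, \mathcal{IC})$, where

$\mathcal{D} = \bigcup_{i=1}^{n} \mathcal{D}_i$ and $\mathcal{IC} = \bigcup_{i=1}^{n} \mathcal{IC}_i$.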

4 Some of them may be trivial and/or useless. For instance, the inconsistency in
$(\mathcal{D}, \mathcal{IC}) = (\{p, q, r\}, \{\neg p\})$ may be removed by deleting every element in $\mathcal{D}$, but
this is certainly not the optimal way of restoring consistency in this case.

5 Set inclusion is also considered in HnWilM ; cardinality is considered, e.g., in BIJ

Given $n$ distributed databases and a preference criterion $\leq$, our goal is to
compute the set $\mathcal{R}(\mathcal{UDB}, \leq)$ of the $\leq$-repaired databases of $\mathcal{UDB}$ (or to be able
to compute, in an efficient way, some elements in this set). Below are test-cases
for such database integration.

Example 1. Consider a distributed database with a relation teaches of the fol- 
lowing scheme: (course_name, teacher_name). Suppose also that each database 
contains a single integrity constraint, stating that the same course cannot be 
taught by two different teachers: 

$\mathcal{IC} = \{ \forall X \forall Y \forall Z\, (teaches(X, Y) \wedge teaches(X, Z) \rightarrow Y = Z) \}$.

Consider now the following two databases:

$\mathcal{DB}_1 = (\{teaches(c_1, n_1),\ teaches(c_2, n_2)\},\ \mathcal{IC})$,

$\mathcal{DB}_2 = (\{teaches(c_2, n_3)\},\ \mathcal{IC})$.

Clearly, the unified database $\mathcal{DB}_1 \cup \mathcal{DB}_2$ is inconsistent. Its preferred repairs
are $(\emptyset, \{teaches(c_2, n_2)\})$ and $(\emptyset, \{teaches(c_2, n_3)\})$. Hence, the two repaired
databases are the following:

$\mathcal{R}_1 = (\{teaches(c_1, n_1),\ teaches(c_2, n_2)\},\ \mathcal{IC})$,
$\mathcal{R}_2 = (\{teaches(c_1, n_1),\ teaches(c_2, n_3)\},\ \mathcal{IC})$.



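Example 2. Let $\mathcal{D}_1 = \{p(a), p(b)\}$, $\mathcal{D}_2 = \{q(a), q(c)\}$, and $\mathcal{IC} = \{\forall X (p(X) \rightarrow q(X))\}$.
Again, $(\mathcal{D}_1, \emptyset) \cup (\mathcal{D}_2, \mathcal{IC})$ is inconsistent. The corresponding preferred
repairs are $(\{q(b)\}, \emptyset)$ and $(\emptyset, \{p(b)\})$. The repaired databases are
$\mathcal{R}_1 = (\{p(a), p(b), q(a), q(b), q(c)\}, \mathcal{IC})$ and $\mathcal{R}_2 = (\{p(a), q(a), q(c)\}, \mathcal{IC})$.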
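
The preferred repairs of Example 2 can be reproduced by exhaustive search. The following Python sketch is a brute-force illustration of Definitions 4, 6 and 7, not the procedure used later in the paper. It assumes a finite universe of ground atoms over the constants a, b, c, and all function names (`repairs`, `inclusion_preferred`, `cardinality_preferred`) are hypothetical.

```python
from itertools import combinations


def powerset(items):
    """All subsets of a finite collection, as frozensets."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def repairs(instance, universe, constraints):
    """All (Insert, Retract) pairs of Definition 4 over a finite atom universe."""
    instance = frozenset(instance)
    outside = frozenset(universe) - instance
    for ins in powerset(outside):           # condition (1): Insert and D are disjoint
        for ret in powerset(instance):      # condition (2): Retract is a subset of D
            repaired = (instance | ins) - ret
            if all(c(repaired) for c in constraints):   # condition (3)
                yield ins, ret


def inclusion_preferred(all_repairs):
    """Repairs with no componentwise strictly smaller repair (set inclusion)."""
    all_repairs = list(all_repairs)

    def strictly_smaller(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return [r for r in all_repairs
            if not any(strictly_smaller(o, r) for o in all_repairs)]


def cardinality_preferred(all_repairs):
    """Repairs minimizing |Insert| + |Retract|."""
    all_repairs = list(all_repairs)
    best = min(len(i) + len(r) for i, r in all_repairs)
    return [(i, r) for i, r in all_repairs if len(i) + len(r) == best]


# Example 2: D = {p(a), p(b), q(a), q(c)}, IC = { forall X (p(X) -> q(X)) }.
constants = ["a", "b", "c"]
universe = {(rel, c) for rel in ("p", "q") for c in constants}
d = {("p", "a"), ("p", "b"), ("q", "a"), ("q", "c")}


def p_implies_q(facts):
    return all(("q", c) in facts for (rel, c) in facts if rel == "p")


found = list(repairs(d, universe, [p_implies_q]))
preferred = set(inclusion_preferred(found))
```

For this input, `preferred` contains exactly the two repairs listed in Example 2, and `cardinality_preferred(found)` selects the same two pairs. The search is exponential in the number of atoms, so it is useful only for checking small examples by hand.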



3 Database repair — A model-theoretic point of view 

In this section we characterize the repairs of a given database in terms of its 
models. First, we consider arbitrary repairs, and show that they can be rep- 
resented either by two-valued models of the theory of integrity constraints, or 
by three-valued models of the set of integrity constraints and the set of literals, 
obtained by applying the closed world assumption on the database facts. Then 
we focus on the most preferred repairs, and show that a certain subset of the 
three-valued models considered above can be used for characterizing $\leq$-preferred
repairs. 

Definition 10. Given a valuation $\nu$ and a truth value $x$. Denote:
$\nu^x = \{p \mid p \text{ is an atomic formula, and } \nu(p) = x\}$.^8

6 See, e.g., f^, |ll| , p5t for more discussions on the examples below.

7 In all the following examples we use set inclusion as the preference criterion. In what
follows we shall fix a preference criterion for choosing the "best" repairs and omit
its notation whenever possible.

8 Note, in particular, that $(\mathcal{H}^{\mathcal{D}})^t = \mathcal{D}$.

The following two propositions characterize repairs in terms of two-valued
structures.

Proposition 1. Let $(\mathcal{D}, \mathcal{IC})$ be a database and let $M$ be a two-valued model of
$\mathcal{IC}$. Let $Insert = M^t \setminus \mathcal{D}$ and $Retract = \mathcal{D} \setminus M^t$. Then (Insert, Retract) is a repair
of $(\mathcal{D}, \mathcal{IC})$.

Proof: The definitions of Insert and Retract immediately imply that $Insert \cap \mathcal{D} = \emptyset$
and $Retract \subseteq \mathcal{D}$. For the last condition in Definition 4, note that in our case
$\mathcal{D} \cup Insert \setminus Retract = \mathcal{D} \cup (M^t \setminus \mathcal{D}) \setminus (\mathcal{D} \setminus M^t) = M^t$. It follows that $M$ is the least
Herbrand model of $\mathcal{D} \cup Insert \setminus Retract$ and it is also a model of $\mathcal{IC}$, therefore
$\mathcal{D} \cup Insert \setminus Retract \models \mathcal{IC}$. □

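Proposition 2. Let (Insert, Retract) be a repair of a database $(\mathcal{D}, \mathcal{IC})$. Then
there is a classical model $M$ of $\mathcal{IC}$,^9 such that $Insert = M^t \setminus \mathcal{D}$ and
$Retract = \mathcal{D} \setminus M^t$.

Proof: Consider a valuation $M$, defined for every atom $p$ as follows:

$$M(p) = \begin{cases} t & \text{if } p \in \mathcal{D} \cup Insert \setminus Retract, \\ f & \text{otherwise.} \end{cases}$$

By its definition, $M$ is a minimal Herbrand model of $\mathcal{D} \cup Insert \setminus Retract$. Now,
since (Insert, Retract) is a repair of $(\mathcal{D}, \mathcal{IC})$, we have that $\mathcal{D} \cup Insert \setminus Retract \models \mathcal{IC}$,
thus $M$ is a (two-valued) model of $\mathcal{IC}$. Moreover, $Insert \cap \mathcal{D} = \emptyset$ and $Retract \subseteq \mathcal{D}$,
hence we have the following:

• $M^t \setminus \mathcal{D} = (\mathcal{D} \cup Insert \setminus Retract) \setminus \mathcal{D} = Insert$,

• $\mathcal{D} \setminus M^t = \mathcal{D} \setminus (\mathcal{D} \cup Insert \setminus Retract) = Retract$. □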



The above formalization in terms of two-valued models has the drawback
that a unified database $\mathcal{UDB}$ in need of a repair is inconsistent. In order to avoid
reasoning on inconsistent theories, and since classical logic can infer everything
from an inconsistent theory, we develop another formalization, based on a
three-valued semantics. The benefit of this is that, as we show below, any database
has models w.r.t. appropriate three-valued semantics, from which it is possible
to pinpoint the inconsistent information, and thus it is also possible to extract
repairs for $\mathcal{UDB}$.

The underlying 3-valued semantics considered here is induced by the
algebraic structure $\mathcal{THREE}$, shown in the double-Hasse diagram of Figure 1.
Intuitively, the elements $t$ and $f$ in $\mathcal{THREE}$ correspond to the usual classical
elements true and false, while the third element, $\top$, represents inconsistent
information (or belief).

Viewed horizontally, $\mathcal{THREE}$ is a complete lattice. We denote the meet,
join, and the order reversing operation on the corresponding order relation (i.e.,
$\leq_t$) by $\wedge$, $\vee$, and $\neg$ (respectively). Viewed vertically, $\mathcal{THREE}$ is a semi-upper
lattice. We denote by $\oplus$ the join operation w.r.t. the corresponding order ($\leq_k$),
so that $t \oplus f = \top$.
We note that $\mathcal{THREE}$ is the algebraic structure that defines the semantics of
several three-valued formalisms, such as LFI and LP p6| , [37| .

Fig. 1. The structure $\mathcal{THREE}$

9 Recall that we assume that $\mathcal{IC}$ is classically consistent, thus it has classical models.

The various semantic notions are defined on $\mathcal{THREE}$ as natural
generalizations of similar classical ones: a valuation $\nu$ is a function that assigns a truth
value in $\mathcal{THREE}$ to each atomic formula. Any valuation is extended to complex
formulae in the obvious way. The set of the designated truth values in $\mathcal{THREE}$
(i.e., those elements in $\mathcal{THREE}$ that represent true assertions) consists of $t$ and
$\top$. A valuation $\nu$ satisfies a formula $\psi$ iff $\nu(\psi)$ is designated. A valuation that
assigns a designated value to every formula in a theory $T$ is a (three-valued)
model of $T$.
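
The operations of the three-valued structure can be written down concretely. The Python sketch below is an illustration under stated assumptions: the three values are encoded as the strings "t", "f" and "T" (the last standing for the inconsistent value), the truth order is taken to be f below the inconsistent value below t, and implication is read as material implication (negation of the antecedent, or the consequent), which is one natural reading of "extended in the obvious way" but is not spelled out in the text. All function names are hypothetical.

```python
T, F, TOP = "t", "f", "T"              # TOP encodes the inconsistent value
TRUTH_ORDER = {F: 0, TOP: 1, T: 2}     # assumed truth order: f < TOP < t


def neg(x):
    """Order-reversing operation on the truth order; TOP is a fixed point."""
    return {T: F, F: T, TOP: TOP}[x]


def conj(x, y):
    """Meet w.r.t. the truth order."""
    return min(x, y, key=TRUTH_ORDER.get)


def disj(x, y):
    """Join w.r.t. the truth order."""
    return max(x, y, key=TRUTH_ORDER.get)


def implies(x, y):
    """Assumption: material implication, i.e. (not x) or y."""
    return disj(neg(x), y)


def oplus(x, y):
    """Least upper bound w.r.t. the knowledge order: t and f are
    incomparable and TOP lies above both."""
    return x if x == y else TOP


def designated(x):
    """The designated values are t and TOP."""
    return x in (T, TOP)


print(implies(T, F))                       # f
print(designated(implies(T, TOP)))         # True
print(designated(conj(TOP, neg(TOP))))     # True
print(oplus(T, F))                         # T
```

The third line of output shows why such a semantics tolerates contradictions: an atom with the inconsistent value satisfies both itself and its negation, so an inconsistent database still has three-valued models.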

Next we characterize the repairs of a database $\mathcal{DB}$ by its three-valued models:

Proposition 3. Let $(\mathcal{D}, \mathcal{IC})$ be a database and let $M$ be a two-valued model of
$\mathcal{IC}$. Consider the three-valued valuation $N$, defined for every atom $p$ by
$N(p) = \mathcal{H}^{\mathcal{D}}(p) \oplus M(p)$, and let $Insert = N^\top \setminus \mathcal{D}$, $Retract = N^\top \cap \mathcal{D}$. Then $N$ is a
three-valued model of $\mathcal{D} \cup \mathcal{IC}$, and (Insert, Retract) is a repair of $(\mathcal{D}, \mathcal{IC})$.

Proof: For the first claim, note that for three-valued valuations $\nu$ and $\mu$, if for
every atom $p$, $\nu(p) \geq_k \mu(p)$, then for every formula $\psi$, $\nu(\psi) \geq_k \mu(\psi)$ (the proof is
by an easy induction on the structure of $\psi$). We denote this fact by $\nu \geq_k \mu$. Note
also, that if $\nu \geq_k \mu$ and $\mu$ is a model of some theory $T$, then $\nu$ is also a model of
$T$. Now, since by the definition of $N$, $N \geq_k \mathcal{H}^{\mathcal{D}}$, and since $\mathcal{H}^{\mathcal{D}}$ is a model of $\mathcal{D}$,
$N$ is a model of $\mathcal{D}$. Similarly, $N \geq_k M$, and $M$ is a model of $\mathcal{IC}$, thus $N$ is also a
model of $\mathcal{IC}$.

For the second part one has to show that the three conditions of
Definition 4 are satisfied. Indeed, the first two conditions obviously hold. For the
last condition, note that
$\mathcal{D} \cup Insert \setminus Retract = \mathcal{D} \cup (N^\top \setminus \mathcal{D}) \setminus (N^\top \cap \mathcal{D}) = \mathcal{D} \cup (M^t \setminus \mathcal{D}) \setminus (M^f \cap \mathcal{D}) = \mathcal{D} \cup (M^t \setminus \mathcal{D}) \setminus (\mathcal{D} \setminus M^t) = M^t$.
It follows that $M$ is the minimal Herbrand model of $\mathcal{D} \cup Insert \setminus Retract$ and it is also a model of
$\mathcal{IC}$, therefore $\mathcal{D} \cup Insert \setminus Retract \models \mathcal{IC}$. □
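
The construction in Proposition 3 can be traced on a small case. The sketch below is a hedged illustration: it takes the database with facts p and r and the single constraint p → q as a toy input, represents a two-valued model by the set of atoms it makes true, and uses hypothetical helper names (`herbrand`, `combine`, `repair_from`).

```python
T, F, TOP = "t", "f", "T"     # TOP encodes the inconsistent value


def oplus(x, y):
    """Join w.r.t. the knowledge order of the three-valued structure."""
    return x if x == y else TOP


def herbrand(true_atoms, atoms):
    """Two-valued valuation that makes exactly `true_atoms` true."""
    return {a: (T if a in true_atoms else F) for a in atoms}


def combine(instance, model_true_atoms, atoms):
    """N(p) = H^D(p) (+) M(p) for every atom p, as in Proposition 3."""
    h = herbrand(instance, atoms)
    m = herbrand(model_true_atoms, atoms)
    return {a: oplus(h[a], m[a]) for a in atoms}


def repair_from(n, instance):
    """(Insert, Retract) read off from the atoms that N maps to TOP."""
    top = {a for a, value in n.items() if value == TOP}
    return top - set(instance), top & set(instance)


atoms = ["p", "q", "r"]
d = {"p", "r"}                      # toy database; constraint: p -> q

m1 = {"p", "q", "r"}                # a classical model of p -> q
n1 = combine(d, m1, atoms)          # {'p': 't', 'q': 'T', 'r': 't'}
ins1, ret1 = repair_from(n1, d)     # ({'q'}, set())

m2 = set()                          # another classical model: everything false
n2 = combine(d, m2, atoms)
ins2, ret2 = repair_from(n2, d)     # (set(), {'p', 'r'})
```

In both cases applying the repair to the database yields exactly the atoms that the chosen model makes true, which is the identity used in the second part of the proof. The second model gives a repair, but clearly not a preferred one.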



Again, it is possible to show that the converse is also true: 






Ofer Arieli, Marc Denecker, Bert Van Nuffelen, and Maurice Bruynooghe 



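Proposition 4. Let (Insert, Retract) be a repair of a database $(\mathcal{D}, \mathcal{IC})$. Then
there is a three-valued model $N$ of $\mathcal{D} \cup \mathcal{IC}$, such that $Insert = N^\top \setminus \mathcal{D}$ and
$Retract = N^\top \cap \mathcal{D}$.

Outline of proof: Consider a valuation $N$, defined as follows:

$$N(p) = \begin{cases} \top & \text{if } p \in Insert \cup Retract, \\ t & \text{if } p \notin Insert \cup Retract \text{ but } p \in \mathcal{D}, \\ f & \text{otherwise.} \end{cases}$$

Clearly, $N$ is a (three-valued) model of $\mathcal{D}$ and $\mathcal{IC}$, and
$N^\top \setminus \mathcal{D} = (Insert \cup Retract) \setminus \mathcal{D} = Insert$, $N^\top \cap \mathcal{D} = (Insert \cup Retract) \cap \mathcal{D} = Retract$. □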

The last two propositions characterize the repairs of $\mathcal{UDB}$ in terms of pairs
that are associated with three-valued models of $\mathcal{D} \cup \mathcal{IC}$. We shall denote the
elements of these pairs as follows:

Definition 11. Let $N$ be a three-valued model and let $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ be a
knowledge-base. Denote: $Insert^N = N^\top \setminus \mathcal{D}$ and $Retract^N = N^\top \cap \mathcal{D}$.

We conclude this model-theoretic analysis by characterizing the set of the
$\leq$-preferred repairs, where $\leq$ is one of the preference criteria considered in
Definition 6 (i.e., set inclusion or differences in cardinality).

Definition 12. Given a knowledge-base $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$, denote:
$\mathcal{M}^{\mathcal{DB}} = \{N \mid N \geq_k \mathcal{H}^{\mathcal{D}} \oplus M,\ M \text{ is a classical model of } \mathcal{IC}\}$.^10

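Example 3. In what follows we shall write $M = \{p_i : x_i\}$ for $M(p_i) = x_i$
($x_i \in \{t, f, \top\}$, $i = 1, \ldots, n$). Let $\mathcal{DB} = (\{p, r\}, \{p \rightarrow q\})$. We have that
$\mathcal{H}^{\mathcal{D}} = \{p : t, q : f, r : t\}$, and so
$\mathcal{M}^{\mathcal{DB}} = \{N \mid N(p) \geq_k t, N(q) = \top, N(r) \geq_k t\} \cup \{N \mid N(p) = \top, N(q) \geq_k f, N(r) \geq_k t\}$.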

Definition 13. Let $S$ be a set of three-valued valuations, and $N_1, N_2 \in S$.

— $N_1$ is $\leq_i$-more consistent than $N_2$, if $N_1^\top \subset N_2^\top$.

— $N_1$ is $\leq_c$-more consistent than $N_2$, if $\#(N_1^\top) < \#(N_2^\top)$.^11

— $N \in S$ is $\leq_i$-maximally consistent in $S$ (respectively, $N$ is $\leq_c$-maximally
consistent in $S$), if there is no $N' \in S$ that is $\leq_i$-more consistent than $N$
(respectively, no $N' \in S$ is $\leq_c$-more consistent than $N$).

Proposition 5. If $N$ is a $\leq_i$-maximally consistent element in $\mathcal{M}^{\mathcal{DB}}$, then
$(Insert^N, Retract^N)$ is a $\leq_i$-preferred repair of $\mathcal{DB}$.

10 Note that $N$ is a three-valued valuation and $M$ is a two-valued model of $\mathcal{IC}$.
11 Recall that $\#(S)$ denotes the size of $S$.

Proposition 6. Suppose that (Insert, Retract) is a $\leq_i$-preferred repair of $\mathcal{DB}$.
Then there is a $\leq_i$-maximally consistent element $N$ in $\mathcal{M}^{\mathcal{DB}}$ s.t. $Insert = Insert^N$
and $Retract = Retract^N$.

Note 1. Propositions 5 and 6 hold also when $\leq_i$ is replaced by $\leq_c$.

Example 4. Consider again Example 2. We have that:

$\mathcal{UDB} = (\mathcal{D}, \mathcal{IC}) = (\{p(a), p(b), q(a), q(c)\}, \{\forall X (p(X) \rightarrow q(X))\})$.

Thus $\mathcal{H}^{\mathcal{D}} = \{p(a) : t, p(b) : t, p(c) : f, q(a) : t, q(b) : f, q(c) : t\}$, and the classical
models of $\mathcal{IC}$ are those in which either $p(y)$ is false or $q(y)$ is true for every
$y \in \{a, b, c\}$. Now, since in $\mathcal{H}^{\mathcal{D}}$ neither $p(b)$ is false nor $q(b)$ is true, it follows
that every element in $\mathcal{M}^{\mathcal{UDB}}$ must assign $\top$ either to $p(b)$ or to $q(b)$. Hence,
the $\leq_i$-maximally consistent elements in $\mathcal{M}^{\mathcal{UDB}}$ (which in this case are also the
$\leq_c$-maximally consistent elements in $\mathcal{M}^{\mathcal{UDB}}$) are the following:

$M_1 = \{ p(a) : t,\ p(b) : \top,\ p(c) : f,\ q(a) : t,\ q(b) : f,\ q(c) : t \}$

$M_2 = \{ p(a) : t,\ p(b) : t,\ p(c) : f,\ q(a) : t,\ q(b) : \top,\ q(c) : t \}$

By Propositions 5 and 6, then, the $\leq_i$-preferred repairs of $\mathcal{UDB}$ (which are also
its $\leq_c$-preferred repairs) are $(Insert^{M_1}, Retract^{M_1}) = (\emptyset, \{p(b)\})$ and
$(Insert^{M_2}, Retract^{M_2}) = (\{q(b)\}, \emptyset)$ (cf. Example 2).

Similarly, the $\leq_i$-maximally consistent (and the $\leq_c$-maximally consistent)
elements in $\mathcal{M}^{\mathcal{DB}}$, where $\mathcal{DB}$ is the database of Example 3, are
$N_1 = \{ p : t,\ q : \top,\ r : t \}$ and $N_2 = \{ p : \top,\ q : f,\ r : t \}$. It follows that the preferred repairs in this
case are $(\{q\}, \emptyset)$ and $(\emptyset, \{p\})$.
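
The computation of Example 4 can be checked mechanically. The following Python sketch enumerates the classical models of the constraint over the six ground atoms, computes for each model the set of atoms that receive the inconsistent value, and keeps the sets that are minimal w.r.t. inclusion. It relies on one observation that is an assumption of the sketch rather than a statement of the paper: any valuation lying above a combined valuation in the knowledge order has a larger set of inconsistent atoms, so only the combined valuations themselves need to be compared. The names `classical_models` and `top_set` are hypothetical.

```python
from itertools import product

constants = ["a", "b", "c"]
atoms = [(rel, c) for rel in ("p", "q") for c in constants]
d = {("p", "a"), ("p", "b"), ("q", "a"), ("q", "c")}    # database of Example 4


def satisfies_ic(true_atoms):
    """forall X (p(X) -> q(X)) in a two-valued model given by its true atoms."""
    return all(("q", c) in true_atoms for c in constants if ("p", c) in true_atoms)


def classical_models():
    """All two-valued models of the constraint over the finite atom set."""
    for bits in product([False, True], repeat=len(atoms)):
        true_atoms = frozenset(a for a, bit in zip(atoms, bits) if bit)
        if satisfies_ic(true_atoms):
            yield true_atoms


def top_set(instance, true_atoms):
    """Atoms given the inconsistent value by the combined valuation:
    exactly those on which the database and the model disagree."""
    return frozenset(instance) ^ true_atoms


tops = {top_set(d, m) for m in classical_models()}
minimal_tops = {s for s in tops if not any(other < s for other in tops)}
preferred_repairs = {(s - d, s & d) for s in minimal_tops}
```

Here `preferred_repairs` consists of the pairs (∅, {p(b)}) and ({q(b)}, ∅), in agreement with the two maximally consistent elements listed in the example.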



4 Database repair — An abductive approach 

In [7] we have presented an abductive approach to the problem of combining
inconsistent databases. In this section we give an outline of this method. For
a more detailed description the reader is referred to [7]; the application itself is
available at http://www.cs.kuleuven.ac.be/~dtai/kt



A high level description of the integration problem under consideration is
given in ID-logic [15], which is a framework for declarative knowledge
representation that extends classical logic with inductive definitions. This logic
incorporates two types of knowledge: definitional and assertional. Assertional knowledge
is a set of first-order statements, representing a general truth about the domain
of discourse. Definitional knowledge is a set of rules of the form $p \leftarrow B$, in which
the head $p$ is a predicate and the body $B$ is a first order formula. A predicate
that appears in a head of a rule is called defined; a predicate that does not occur
in any head is called open, or abducible.

A theory $T$ in ID-logic is therefore a pair (Def, Fol), where Def (the
definitional knowledge) is a set of rules as described above, and Fol (the assertional
knowledge) is a set of first order statements. The meaning of $T$ is defined by
the extended well-founded semantics [35] as follows: let $M$ be an arbitrary
two-valued interpretation for the open predicates in Def. Once $M$ is determined, Def
becomes a standard logic program, with a unique well-founded model [42]. This
model is then a model of the whole theory $T$ if it is also a model of Fol.

ID-logic is a generalization of the notion of abductive logic programs (ALP)
[18]. For instance, the open predicates of a theory in ID-logic correspond to
the abducibles in an abductive logic program. Consequently, solutions of
abductive logic programs that are computed by an abductive solver are also
models of the corresponding ID-logic theory. Here we use such a solver, called the
A-system [^,^7|, for computing solutions. The main idea of this solver is to reduce
a high level specification into a lower level constraint store, which is managed
by a constraint solver. The solver combines the refutation procedures SLDNFA
[17] and ACLP [29], and uses an improved control strategy. In our case,
solutions are repairs of a database, and in order to compute preferred solutions (i.e.,
preferred repairs for the integrated database), the A-system has been extended
with a simple branch and bound component, called optimizer (see [Q). This is
actually a "filter" on the solution space that speeds up execution and makes
sure that only the desired solutions will be obtained.

The elements of the distributed databases are uniformly represented by the 
unary predicate db, and the elements of a repaired database are represented by 
the unary predicate fact. In order to compute these elements, two open pred- 
icates are used: retract and insert. These predicates represent, respectively, 
the facts that may be removed and those that may be introduced for restoring 
the consistency of the unified database. The rules for computing the elements of 
a repaired database are then defined as follows: 

fact(X) :- db(X), not retract(X).
fact(X) :- insert(X).

In addition, the following integrity constraints are specified:

— It is inconsistent to have a retracted element that does not belong to some
database:

ic :- retract(X), not db(X).

— It is inconsistent to have an inserted element that belongs to a database:

ic :- insert(X), db(X).

To make sure that all the integrity constraints will hold w.r.t. the combined
data, every occurrence of a database fact R(x) in some integrity constraint is
replaced by fact(R(x)).

Below is the code for implementing Example 1:

defined(fact(_)). defined(db(_)).
open(insert(_)). open(retract(_)).

fact(X) :- db(X), not(retract(X)).
fact(X) :- insert(X).

ic :- insert(X), db(X).
ic :- retract(X), not db(X).

db(teaches(1,1)). db(teaches(2,2)).                      % D1
db(teaches(2,3)).                                        % D2

ic :- fact(teaches(X,Y)), fact(teaches(X,Z)), Y\=Z.      % IC

In what follows we use the notation "ic :- B" to denote the denial "false ← B".
The code for Example 2 is similar.



We have executed this code as well as other examples from the literature in 
our system. The soundness and completeness theorems given in the next section 
guarantee that the output in each case is indeed the set of the most preferred 
solutions of the corresponding problem. 
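
To make the declarative reading of the program above concrete, the following Python sketch checks candidate values for the two open predicates against the rules and denials, and searches for the smallest candidates. It is not a description of how the A-system works (the solver uses SLDNFA-resolution and a constraint store); the brute-force search by increasing size merely stands in for the cardinality optimizer. All function names are hypothetical, and the set of atoms that may be inserted is an extra parameter assumed by the sketch.

```python
from itertools import combinations

# db/1 facts of Example 1, with courses and teachers encoded as integers.
db = {("teaches", 1, 1), ("teaches", 2, 2), ("teaches", 2, 3)}


def facts(db, insert, retract):
    """fact(X) :- db(X), not retract(X).   fact(X) :- insert(X)."""
    return {x for x in db if x not in retract} | set(insert)


def violated(db, insert, retract):
    """True if the body of some denial 'ic :- B' is satisfied."""
    if any(x not in db for x in retract):        # ic :- retract(X), not db(X).
        return True
    if any(x in db for x in insert):             # ic :- insert(X), db(X).
        return True
    f = facts(db, insert, retract)
    # ic :- fact(teaches(X,Y)), fact(teaches(X,Z)), Y\=Z.
    return any(x1 == x2 and y != z
               for (_, x1, y) in f for (_, x2, z) in f)


def smallest_solutions(db, insertable=()):
    """Candidate (insert, retract) pairs of minimal total size that violate no denial."""
    pool = ([("retract", x) for x in sorted(db)]
            + [("insert", x) for x in sorted(insertable)])
    for size in range(len(pool) + 1):
        found = []
        for choice in combinations(pool, size):
            insert = {x for kind, x in choice if kind == "insert"}
            retract = {x for kind, x in choice if kind == "retract"}
            if not violated(db, insert, retract):
                found.append((frozenset(insert), frozenset(retract)))
        if found:
            return found
    return []


solutions = smallest_solutions(db)
```

For the facts of Example 1 the search returns the two solutions that retract teaches(2,2) or teaches(2,3) respectively, which correspond to the preferred repairs given in that example.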

5 Soundness and Completeness 

In this section we relate the two approaches of the previous sections through
soundness and completeness theorems. For that we first recall some related
results from [7] (Propositions 7-10 below). In what follows we denote by $\mathcal{T}$ an
abductive theory, constructed as described in Section 4 for defining a
composition problem of $n$ databases $\mathcal{DB}_1, \ldots, \mathcal{DB}_n$.

Proposition 7. Every abductive solution that is obtained by the A-system for
$\mathcal{T}$ is a repair of $\mathcal{UDB}$.

Proposition 8. Suppose that the query '← true' has a finite SLDNFA-tree
w.r.t. $\mathcal{T}$. Then every repair of $\mathcal{UDB}$ is obtained by running $\mathcal{T}$ in the A-system.

Proposition 9. Every output that is obtained by running $\mathcal{T}$ in the A-system
together with a $\leq_i$-optimizer [respectively, together with a $\leq_c$-optimizer] is a
$\leq_i$-preferred repair [respectively, a $\leq_c$-preferred repair] of $\mathcal{UDB}$.

Proposition 10. Suppose that the query '← true' has a finite SLDNFA-tree
w.r.t. $\mathcal{T}$. Then every $\leq_i$-preferred repair [respectively, every $\leq_c$-preferred repair]
of $\mathcal{UDB}$ is obtained by running $\mathcal{T}$ in the A-system together with a $\leq_i$-optimizer
[respectively, together with a $\leq_c$-optimizer].

By the propositions above and those of Section 3, we have: 

Corollary 1. Suppose that the query '← true' has a finite SLDNFA refutation
tree w.r.t. $\mathcal{T}$. Then:

1. for every output (Insert, Retract) of the A-system for $\mathcal{T}$, there is a classical
model $M$ of $\mathcal{IC}$ s.t. $Insert = M^t \setminus \mathcal{D}$ and $Retract = \mathcal{D} \setminus M^t$.

2. for every two-valued model $M$ of $\mathcal{IC}$ there is an output (Insert, Retract) of
the A-system for $\mathcal{T}$, s.t. $Insert = M^t \setminus \mathcal{D}$ and $Retract = \mathcal{D} \setminus M^t$.

Corollary 2. Under the same assumption as that of Corollary 1,

1. for every output (Insert, Retract) of the A-system for $\mathcal{T}$ there is a 3-valued
model $N$ of $\mathcal{D} \cup \mathcal{IC}$, s.t. $Insert^N = Insert$ and $Retract^N = Retract$.

2. for every 3-valued model $N$ of $\mathcal{D} \cup \mathcal{IC}$ there is an output (Insert, Retract) of
the A-system for $\mathcal{T}$, s.t. $Insert = Insert^N$ and $Retract = Retract^N$.

Corollary 3. In the notations of Corollary 2 and under its assumption,

1. for every output (Insert, Retract) that is obtained by running $\mathcal{T}$ as an input
to the A-system together with a $\leq_i$-optimizer [respectively, together with
a $\leq_c$-optimizer], there is a $\leq_i$-maximally consistent element [respectively,
a $\leq_c$-maximally consistent element] $N$ in $\mathcal{M}^{\mathcal{UDB}}$ s.t. $Insert^N = Insert$ and
$Retract^N = Retract$.

2. for every $\leq_i$-maximally consistent element [respectively, $\leq_c$-maximally
consistent element] $N$ in $\mathcal{M}^{\mathcal{UDB}}$ there is a solution (Insert, Retract) that is
obtained by running $\mathcal{T}$ in the A-system together with a $\leq_i$-optimizer
[respectively, together with a $\leq_c$-optimizer] s.t. $Insert = Insert^N$ and
$Retract = Retract^N$.



6 Related works 

Coherent integration and proper representation of amalgamated data is exten- 
sively studied in the literature (see, e.g., |Jl|,|||||||l|,||j3|||,|||l) ) . Com- 
mon approaches for dealing with this task are based on techniques of belief re- 
vision pl[ , methods of resolving contradictions by quantitative considerations 
(such as "majority vote" p2|) or qualitative ones (e.g., defining priorities on dif- 
ferent sources of information or preferring certain data over another |^,^)), and 
approaches that are based on rewriting rules for representing the information in 
a specific form p5| . As in our case, abduction is used for database updating in 
and an extended form of abduction is used in [ p6|]39| ] to explain modifications 
in a theory. 

The use of three-valued logics is also a well-known technique for maintaining
incomplete or inconsistent information; such logics are often used for defining
fixpoint semantics of incomplete logic programs [19], [42], and so in principle they
can be applied to integrity constraints in an (extended) clause form.
Three-valued formalisms such as LFI [13] are also the basis of paraconsistent methods
to construct database repairs [14] and are useful in general for pinpointing
inconsistencies [37]. As noted above, this is also the role of the three-valued semantics
in our case.

Other approaches are based on semantics with arbitrarily many truth values,
which allow one to decode within the language itself some "meta-information" such
as confidence factors, amount of belief for or against a specific assertion, etc.
These approaches combine corresponding formalisms of knowledge
representation (such as annotated logic programs fHim or bilattice-based logics |f|,||)
together with non-classical refutation procedures |2(],|3(],[40| that allow one to detect
inconsistent parts of a database and maintain them.

A closely related topic is the problem of giving consistent query answers
in inconsistent databases ]|,[l(],^5|. The idea is to answer database queries in a
consistent way without computing the repairs of the database.

There are some other applications for integrating possibly conflicting
information and updating databases (e.g., LUPS Q, BReLS RI [[30|,
Subrahmanian's mediator of annotated databases [41], and the system of Franconi et al.
[22]). In comparison with such systems, we note that the main advantages of the
present application are its expressive power (to the best of our knowledge, our
approach is more expressive than any other available application for coherent
data integration), the fact that no syntactical embedding of first-order formulae
into other languages nor any extensions of two-valued semantics are necessary
(our approach is a pure generalization of classical refutation procedures), and
the encapsulation of the way that the underlying data is kept coherent (no input
from the reasoner nor any other external policy for making preferences among
conflicting sources is compulsory in order to resolve contradictions).



7 Future work 

We conclude by sketching some issues for future work. First, as we have already
noted, two more phases, which have not been considered here, might be needed
for a complete data integration: (a) translation of different concepts to a unified
ontology, and (b) resolving contradictions among different integrity constraints.
Another issue for future work is to allow definitions of concepts (and not only
integrity constraints) in the databases (see [15] for a sketch of how this may be
done). This data may be further combined with (possibly inconsistent)
temporal information, (partial) transactions, and (contradictory) update information.
Finally, since different databases may have different information about the same
predicate, it is reasonable to use some weakened version of the closed world
assumption as part of the integration process (for instance, an assumption that
something is false unless it is in the database, or some other database has some
information about it). An alternative approach may be to replace the closed
world assumption with partial valuations (in case that databases may contain
negative facts and not only positive ones).